Abstract

A connection between the theory of neural networks and cryptography is presented. A new phenomenon, namely the synchronization of neural networks, leads to a new method for exchanging secret messages. Numerical simulations show that two artificial networks trained by a Hebbian learning rule on their mutual outputs develop an antiparallel state of their synaptic weights. The synchronized weights are used to construct an ephemeral key-exchange protocol for the secure transmission of secret data. It is shown that an opponent who knows the protocol and all details of every transmission of the data has no chance to decrypt the secret message, since tracking the weights is a hard problem compared to synchronization. The complexity of generating the secure channel is linear in the size of the network.

PACS numbers: 87.18.Sn, 89.70.+c

Building a secure channel is one of the most challenging fields of research in modern communication. Since secure channels have many applications, in particular for mobile-phone, satellite and internet-based communications, there is a need for fast, effective and secure transmission protocols [1]. Here we present a novel principle for a cryptosystem, based on a new phenomenon that we observe in artificial neural networks.

The goal of cryptography is to enable two partners to communicate over an insecure channel in such a way that an opponent cannot understand and decrypt the transmitted message. In a general scenario, the sender encrypts the message with a key $E_k$ and sends the result, the ciphertext, over the channel. A third party eavesdropping on the channel should not be able to determine what the message was. The recipient, however, who knows the encryption key, can decrypt the ciphertext using his private key $D_k$.

In a private-key system the recipient has to agree with the sender on a secret key $E_k$, which requires hidden communication prior to the transmission of any message. In a public-key system, on the other hand, the key $E_k$ is published and hidden communication is not necessary. Nevertheless, an opponent cannot decrypt the transmitted message, since it is computationally infeasible to invert the encryption function without knowing the key $D_k$. In a key-exchange protocol, both partners start with private keys and, using a public protocol, transmit their encrypted private keys, which after some transformations lead to a common secret key. Most applications use a public-key system based on number theory, in which the keys are long integers […].

In this report we suggest a novel cryptosystem. It is a key-exchange protocol that uses neither number theory nor a public key; instead, it is based on a learning process of neural networks. The two participants start from secret sets of vectors $E_k(0)$ and $D_k(0)$ without knowing the key of their partner. By exchanging public information, the two keys develop into a common time-dependent key $E_k(t) = -D_k(t)$, which is used to encrypt and decrypt a given message. An opponent who knows the algorithm and observes every exchange of information is not able to find the keys $E_k(t)$ and $D_k(t)$. Our method is based on a new phenomenon presented here: synchronization of neural networks by mutual learning […].

Simple models of neural networks describe a wide variety of phenomena in neurobiology and information theory […]. Artificial neural networks are systems of elements interacting through adaptive couplings, which are trained from a set of examples. After training they function as content-addressable associative memories, as classifiers or as prediction algorithms.

In this report we present a new phenomenon: two feedforward networks can synchronize their synaptic weights by exchanging and learning their mutual outputs for given common inputs. Surprisingly, synchronization is fast; the number of bits required to achieve perfect alignment of the weights is lower than the number of components of the weights. After synchronization, the synaptic weights define the common time-dependent private key $E_k(t) = -D_k(t)$. With respect to possible applications, we find, first, that tracking the weights of one of the networks is a hard problem for the opponent. Although we were not able to find a mathematical proof, our simulations, together with arguments based on analytic results for neural networks, give clear evidence that our key-exchange protocol is secure […]. Second, the complexity of our cryptosystem scales linearly with the size of the network (= the number of bits of the keys). In summary, from this new biological mechanism one can construct efficient encryption systems using keys that change continuously.

This phenomenon, as well as the corresponding applications in cryptography, can be extended to a system of several partners communicating with each other, and to other tasks relying on a secure channel […]. Since synchronization is also a subject of recent research in neuroscience [7], [9], we believe that our bridge between the theory of neural networks and cryptography may help to understand communication between parts of biological neuronal or genetic networks.

In the following we introduce and investigate a simple model that shows the properties sketched above. The architecture used by the recipient and the sender is a two-layered perceptron, exemplified here by a parity machine (PM) with $K$ hidden units. More precisely, the input has size $KN$ and its components are denoted by $x_{kj}$, $k = 1, 2, \dots, K$ and $j = 1, \dots, N$. For simplicity, each input unit takes binary values, $x_{kj} = \pm 1$. The $K$ binary hidden units are denoted by $y_1, y_2, \dots, y_K$. Our architecture is characterized by non-overlapping receptive fields (a tree), where the weight from the $j$th input unit to the $k$th hidden unit is denoted by $w_{kj}$, and the output bit $O$ is the product of the states of the hidden units (see Fig. 1). For simplicity we discuss PMs with three hidden units, $K = 3$. We use integer weights bounded by $L$, i.e. $w_{kj}$ can take the values $-L, -L+1, \dots, L$.

The secret information of each of the two partners consists of the initial values of the weights, $w_{kj}^S$ for the sender and $w_{kj}^R$ for the recipient. Together these are $6N$ integers, $3N$ of the recipient and $3N$ of the sender. Sender and recipient do not know the initial numbers of their partners, which are used to construct the common secret key.
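As a minimal sketch, the secret initial state of each partner and one common public input can be set up as follows. The values of $N$ and $L$, the use of Python's `secrets.SystemRandom` for the secret weights, and the seeded `random.Random` used as a shared source of public inputs are assumptions made for illustration. The text does not specify how the weights or the inputs are generated.

```python
import random
import secrets

K = 3    # hidden units, as in the text
L = 3    # weight bound (assumed value for this sketch)
N = 101  # inputs per hidden unit (assumed value for this sketch)


def random_weights(rng, k=K, n=N, bound=L):
    """K x N integer weights drawn uniformly from {-L, ..., L}."""
    return [[rng.randint(-bound, bound) for _ in range(n)] for _ in range(k)]


def random_input(rng, k=K, n=N):
    """K x N public input with components x_kj = +1 or -1."""
    return [[rng.choice((-1, 1)) for _ in range(n)] for _ in range(k)]


# Each partner draws its secret initial weights privately.
w_sender = random_weights(secrets.SystemRandom())
w_recipient = random_weights(secrets.SystemRandom())

# One common public input, here from a shared seeded generator (hypothetical choice).
x = random_input(random.Random(2002))

secret_integers = sum(map(len, w_sender)) + sum(map(len, w_recipient))
print(secret_integers)  # 6N = 606 for N = 101
```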

Now each network is trained with the output of its partner. At each training step, for the synchronization as well as for the encryption/decryption step, the sender and the recipient both need a new common public input vector $(x_{kj})$. For a given input, the output is calculated in two steps. In the first step, the states of the three hidden units of the sender and the recipient, $y_k^{S/R}$, $k = 1, 2, 3$, are determined from the corresponding fields

$$y_k^{S/R} = \mathrm{sign}\Big(\sum_{j=1}^{N} w_{kj}^{S/R}\, x_{kj}\Big). \qquad (1)$$

In the case of zero field, $\sum_{j} w_{kj}^{S/R} x_{kj} = 0$, the sender sets the hidden unit to $+1$ and the recipient sets it to $-1$. In the next step the output $O^{S/R}$ is determined by the product of the hidden units, $O^{S/R} = y_1^{S/R}\, y_2^{S/R}\, y_3^{S/R}$.

Figure 1: Architecture of the networks: $3N$ input units $x$ are transmitted by three weight vectors $w$ to three hidden units $y$. The final output bit $O$ is the product of the hidden units.
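The two-step output computation can be sketched in Python as follows. The function names are hypothetical. The zero-field convention (sender $+1$, recipient $-1$) follows the text. The toy example uses recipient weights that are exactly the negated sender weights. This shows that antiparallel weights give opposite hidden units and, because $K = 3$ is odd, opposite output bits.

```python
from math import prod


def hidden_units(w, x, zero_value):
    """Eq. (1): y_k = sign(sum_j w_kj x_kj) for each hidden unit k.

    A zero field is mapped to zero_value (+1 for the sender, -1 for the recipient).
    """
    ys = []
    for w_k, x_k in zip(w, x):
        field = sum(w_kj * x_kj for w_kj, x_kj in zip(w_k, x_k))
        ys.append(1 if field > 0 else -1 if field < 0 else zero_value)
    return ys


def output_bit(ys):
    """The output bit O is the product of the hidden units."""
    return prod(ys)


# Toy example with K = 3, N = 2 (values chosen for illustration only).
w_s = [[1, -2], [0, 3], [2, 2]]
w_r = [[-v for v in row] for row in w_s]  # antiparallel to the sender
x = [[1, 1], [1, -1], [1, -1]]

y_s, y_r = hidden_units(w_s, x, +1), hidden_units(w_r, x, -1)
print(y_s, output_bit(y_s))  # [-1, -1, 1] 1
print(y_r, output_bit(y_r))  # [1, 1, -1] -1
```

The third hidden unit has zero field here, so it is the zero-field convention alone that keeps $y_3^R = -y_3^S$.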

The sender sends its output (one bit) to the recipient, the recipient sends its output to the sender, and both networks are trained with the output of their partner. If they do not agree on the current output, $O^S O^R < 0$, the weights of the sender/recipient are updated according to the following Hebbian learning rule [6], […]:



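$$\text{if } O^{S/R}\, y_k^{S/R} > 0 \text{ then } w_{kj}^{S/R} \to w_{kj}^{S/R} - O^{S/R}\, x_{kj},$$
$$\text{if } |w_{kj}^{S/R}| > L \text{ then } w_{kj}^{S/R} \to \mathrm{sign}(w_{kj}^{S/R})\, L. \qquad (2)$$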

In each of the two networks, only the weights belonging to the hidden units that are in the same state as the output unit (one or three of them) are updated. Note that with this dynamical rule the sender tries to imitate the response of the recipient, and the recipient tries to imitate that of the sender.
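A minimal sketch of the update rule (2) follows. The loop starts from hand-made antiparallel weights (not from an actual synchronization run) and checks numerically that antiparallel weights stay antiparallel under Eq. (2). The helper names, $N$, $L$ and the random seed are illustrative assumptions.

```python
import random
from math import prod


def hidden_units(w, x, zero_value):
    """Eq. (1) with the zero-field convention (sender +1, recipient -1)."""
    return [
        1 if f > 0 else -1 if f < 0 else zero_value
        for f in (sum(a * b for a, b in zip(w_k, x_k)) for w_k, x_k in zip(w, x))
    ]


def hebbian_update(w, x, ys, o, bound):
    """Eq. (2), in place: for each hidden unit with O * y_k > 0,
    set w_kj <- w_kj - O * x_kj and clip the result to [-L, L]."""
    for k, y_k in enumerate(ys):
        if o * y_k > 0:
            w[k] = [max(-bound, min(bound, w_kj - o * x_kj)) for w_kj, x_kj in zip(w[k], x[k])]


rng = random.Random(0)
K, N, L = 3, 21, 3
w_s = [[rng.randint(-L, L) for _ in range(N)] for _ in range(K)]
w_r = [[-v for v in row] for row in w_s]  # start in the antiparallel state

updates = 0
for _ in range(1000):
    x = [[rng.choice((-1, 1)) for _ in range(N)] for _ in range(K)]
    y_s, y_r = hidden_units(w_s, x, +1), hidden_units(w_r, x, -1)
    o_s, o_r = prod(y_s), prod(y_r)
    if o_s * o_r < 0:  # outputs disagree: both partners learn
        hebbian_update(w_s, x, y_s, o_s, L)
        hebbian_update(w_r, x, y_r, o_r, L)
        updates += 1

print(updates, all(a == -b for rs, rr in zip(w_s, w_r) for a, b in zip(rs, rr)))  # 1000 True
```

In the antiparallel state the two outputs always disagree, so a learning step occurs at every input. Because the clipping to $[-L, L]$ is symmetric, the opposite updates remain exactly opposite.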

There are three main ingredients in our model that are essential for a secure key-exchange protocol. First, the internal representation of the hidden units is not uniquely determined by the output, because there is a fourfold degeneracy: for the output $+1$ there are four internal representations of the three hidden units, $(1,1,1)$, $(1,-1,-1)$, $(-1,1,-1)$ and $(-1,-1,1)$. As a consequence, an observer cannot know which of the weight vectors are updated according to Eq. (2). Second, we have chosen the parity machine because, in the case of static weights, it is known that an opponent cannot obtain any knowledge about the rule if he is trained with fewer than $\alpha(L)N$ random examples (where $\alpha(L) = 2.63$ for $L = 3$; for more details see […]). This analytic result favours the PM over other multilayer networks. Third, since each component is bounded by $L$, an observer cannot invert the sum of Eq. (2); the network forgets […]. As a consequence of these three ingredients, the initial weight vectors cannot be recovered from knowledge of the time-dependent synchronized keys. All three mechanisms, namely the hidden units, the PM architecture and the bounded weights, make the problem hard for any observer.
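The fourfold degeneracy can be checked by brute-force enumeration. This small sketch (the helper name is hypothetical) lists, for each output bit of a $K = 3$ parity machine, the internal representations consistent with it. It also counts how many hidden units Eq. (2) would update in each case. An observer who sees only the output bit cannot tell these cases apart.

```python
from itertools import product
from math import prod


def internal_representations(output, k=3):
    """All hidden-unit states (y_1, ..., y_k) whose product equals the output bit."""
    return [ys for ys in product((1, -1), repeat=k) if prod(ys) == output]


for o in (1, -1):
    reps = internal_representations(o)
    # Eq. (2) updates exactly the hidden units whose state equals the output.
    updated = [sum(1 for y in ys if y == o) for ys in reps]
    print(o, reps, updated)

# 1 [(1, 1, 1), (1, -1, -1), (-1, 1, -1), (-1, -1, 1)] [3, 1, 1, 1]
# -1 [(1, 1, -1), (1, -1, 1), (-1, 1, 1), (-1, -1, -1)] [1, 1, 1, 3]
```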

We find that the two PMs learning from each other are able to synchronize, at least for some parameters $K$, $L$ and $N$ […]. Our simulations show that after a relatively short initial transient the two partners align themselves into antiparallel states. It is easy to verify from our learning rule that as soon as the two networks are synchronized they stay so forever: if $w^R = -w^S$, then $y_k^R = -y_k^S$ (including the zero-field case) and $O^R = -O^S$, so Eq. (2) updates the same hidden units by opposite amounts, and the symmetric bound $L$ preserves this symmetry. The number of time steps needed to reach this state depends on the initial weight vectors and on the sequence of random inputs; hence it is distributed. Fig. 2 shows the distribution of synchronization times obtained from at least 1000 samples. Evidently, two communicating networks synchronize in a rather short time. The average synchronization time $t_{av}$ decreases with increasing system size $N$ (see Fig. 3) and seems to converge to $t_{av} \approx 410$ for infinitely large networks. Surprisingly, in the limit of large $N$ one needs to exchange only about 400 bits to obtain agreement between $3N$ components. However, one should keep in mind that the two partners do not learn each other's initial weights; they are merely attracted to a dynamical state with opposite weight vectors.
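The mutual-learning process can be simulated end to end with a short sketch. It implements Eqs. (1) and (2) and stops as soon as $w^R = -w^S$. Several choices here are assumptions of the sketch. A single seeded generator produces both the secret weights and the public inputs, which is only a simulation shortcut. The antiparallel test compares both weight sets directly, something neither partner can do in practice. $N$, $L$, the seed and `max_steps` are illustrative. The step count it prints is one sample, not a reproduction of Figs. 2 and 3.

```python
import random
from math import prod


def hidden_units(w, x, zero_value):
    """Eq. (1) with the zero-field convention (sender +1, recipient -1)."""
    return [
        1 if f > 0 else -1 if f < 0 else zero_value
        for f in (sum(a * b for a, b in zip(w_k, x_k)) for w_k, x_k in zip(w, x))
    ]


def update(w, x, ys, o, bound):
    """Eq. (2): w_kj <- clip(w_kj - O x_kj) for hidden units with y_k = O."""
    for k, y_k in enumerate(ys):
        if o * y_k > 0:
            w[k] = [max(-bound, min(bound, a - o * b)) for a, b in zip(w[k], x[k])]


def antiparallel(w_s, w_r):
    return all(a == -b for rs, rr in zip(w_s, w_r) for a, b in zip(rs, rr))


def synchronize(n, k=3, bound=3, seed=0, max_steps=100_000):
    """Run mutual learning until w_R = -w_S; return (steps, w_s, w_r).

    steps is None if synchronization did not occur within max_steps.
    """
    rng = random.Random(seed)  # one generator for everything: a simulation shortcut
    w_s = [[rng.randint(-bound, bound) for _ in range(n)] for _ in range(k)]
    w_r = [[rng.randint(-bound, bound) for _ in range(n)] for _ in range(k)]
    for step in range(1, max_steps + 1):
        x = [[rng.choice((-1, 1)) for _ in range(n)] for _ in range(k)]
        y_s, y_r = hidden_units(w_s, x, +1), hidden_units(w_r, x, -1)
        o_s, o_r = prod(y_s), prod(y_r)
        if o_s * o_r < 0:  # the two output bits disagree: both partners learn
            update(w_s, x, y_s, o_s, bound)
            update(w_r, x, y_r, o_r, bound)
        if antiparallel(w_s, w_r):
            return step, w_s, w_r
    return None, w_s, w_r


steps, w_s, w_r = synchronize(n=101, seed=1)
print("synchronized after", steps, "steps")
```

Repeating the call with different seeds gives a sample of synchronization times, which is how a distribution such as the one in Fig. 2 could be estimated.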

As soon as the weights of the sender and the recipient are antiparallel, the public initialization of our private-key cryptosystem has terminated successfully and the encryption of the message starts. There are now two possible choices of algorithm. First, one can use a conventional encryption algorithm, for example a stream cipher such as the well-known Blum-Blum-Shub bit generator. In this case the seed for this pseudo-random number generator is constructed from the weight vector after synchronization. Second, one can use the PM itself as a stream cipher by multiplying its output bit with the corresponding bit of the secret data.
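The second option, the PM used as a stream cipher, can be sketched as follows. The sketch makes several assumptions. Message bits are encoded as $\pm 1$ so that "multiplying" is literal. The two networks are taken as already synchronized (the recipient weights are simply set to the negated sender weights). Eq. (2) keeps being applied during encryption, in line with the time-dependent key $E_k(t)$. The hypothetical `transmit` helper simulates both ends in one place. Because $O^R = -O^S$ in the antiparallel state, the recipient decrypts by multiplying with $-O^R$.

```python
import random
from math import prod


def hidden_units(w, x, zero_value):
    """Eq. (1) with the zero-field convention (sender +1, recipient -1)."""
    return [
        1 if f > 0 else -1 if f < 0 else zero_value
        for f in (sum(a * b for a, b in zip(w_k, x_k)) for w_k, x_k in zip(w, x))
    ]


def update(w, x, ys, o, bound):
    """Eq. (2): w_kj <- clip(w_kj - O x_kj) for hidden units with y_k = O."""
    for k, y_k in enumerate(ys):
        if o * y_k > 0:
            w[k] = [max(-bound, min(bound, a - o * b)) for a, b in zip(w[k], x[k])]


def transmit(message, w_s, w_r, public_rng, bound=3):
    """Encrypt +/-1 message bits with the sender's output bit and decrypt them
    with the recipient's output bit, using a fresh public input for every bit."""
    decrypted = []
    for m in message:
        x = [[public_rng.choice((-1, 1)) for _ in row] for row in w_s]
        y_s, y_r = hidden_units(w_s, x, +1), hidden_units(w_r, x, -1)
        o_s, o_r = prod(y_s), prod(y_r)
        cipher_bit = m * o_s                 # sent over the public channel
        decrypted.append(-cipher_bit * o_r)  # O^R = -O^S after synchronization
        if o_s * o_r < 0:                    # learning continues: time-dependent key
            update(w_s, x, y_s, o_s, bound)
            update(w_r, x, y_r, o_r, bound)
    return decrypted


rng = random.Random(7)
K, N, L = 3, 21, 3
w_s = [[rng.randint(-L, L) for _ in range(N)] for _ in range(K)]
w_r = [[-v for v in row] for row in w_s]  # assume synchronization has finished
message = [rng.choice((-1, 1)) for _ in range(64)]
print(transmit(message, w_s, w_r, random.Random(99)) == message)  # True
```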

In the case of the PM, the complexity of the encryption/decryption processes scales linearly with the size of the transmitted message. The number of steps needed for synchronization, by contrast, does not grow with the size of the network, and each step costs a number of operations proportional to $N$. Hence our construction is a linear cryptosystem [1].

Figure 2: Distribution of the synchronization time $t_{sync}$ for three sizes $N$ of the two networks.

Now we examine a possible attack on our cryptosystem. The opponent eavesdropping on the line knows the algorithm as well as the actual mutual outputs; hence he knows in which time steps the weights are changed. In addition, the opponent knows the inputs $x_{kj}$. However, the opponent does not know the initial weights of the sender and the recipient. As a consequence, even in the synchronized state the internal representations of the hidden units of the sender and the recipient are hidden from the opponent, and he does not know which weights participate in a learning step. For random inputs all four internal representations appear with equal probability at any stage of the dynamical process; hence for $t$ training steps there are $4^t$ possible sequences of internal representations.
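To give a sense of scale, the count $4^t$ can be rewritten in bits. The value $t = 400$ below is only an illustration, chosen close to the roughly 400 exchanged bits mentioned above. It is a naive count of combinations, not a security estimate.

$$4^{t} = 2^{2t}, \qquad 4^{400} = 2^{800} \approx 6.7 \times 10^{240}.$$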

Therefore, on the time scale of synchronization, the observing network has no chance to obtain complete knowledge about the other two networks. We have simulated attacks by an observer, assuming that the most effective algorithm is a network with the same architecture as the recipient: a PM with the same learning rule and parameters as described above. The observing network is trained with the input vector and the output bit of the sender, and the training step (2) is performed only if sender and recipient disagree with each other. Note that the sender does not react to the output of the opponent. Compared with the recipient, the opponent therefore learns with a large noise-to-signal ratio […].

Figure 3: Average synchronization time as a function of $1/N$, for system sizes $N = 11, 21, 51, 101, 1001$.

For each component of the weight vectors, the learning rule (2) may be considered as a kind of biased random walk with reflecting boundaries. Therefore, after very long times the observer may reach the weight vector of the other network by chance. Fig. 4 shows the distribution of the ratio between the time $t_{sync}$ that sender and recipient need to synchronize and the learning time $t_{learn}$ that the opponent needs for complete overlap. For $N = 101$ the average learning time is about 125 times larger than the corresponding synchronization time. In addition, the tail of the distribution at large ratios shrinks with increasing system size.

Hence the time to synchronize by chance is very long; in the example discussed here it is of order $10^5$ time steps. The heart of our cryptosystem is that synchronization is a much simpler task than tracking by an observer.

This principle is also supported by the following observation. Assume that the observer already has some knowledge about one of the networks: he knows $M$ out of $N$ components for each of the three hidden units. We have measured whether the observer succeeds in synchronizing within 2000 time steps, a time much longer than the average synchronization time of the two communicating networks. For $M = N$ the observer is parallel to one of the networks and remains so forever. But already for $M = N - 1$ we find a high probability that the weight vector of the observer separates completely from its almost parallel alignment. For smaller values of $M$ the probability that the observer synchronizes decreases quickly to zero. Surprisingly, even when the observer has almost complete knowledge about the two partners, he does not succeed in obtaining complete information from the learning examples. This reduces the opponent's chance of imitating one of the communicating networks with an ensemble of PMs.

Figure 4: Frequency of the ratio $r$ between synchronization and learning times.

Our key-exchange protocol can be generalized to include bit packages, as briefly described below. An important issue for the implementation of our cryptosystem is to accelerate the synchronization process from hundreds of time steps to a few dozen while keeping the channel secure. Surprisingly, both goals can be achieved simultaneously by sending bit packages (BP). In this scenario the process consists of the following steps:
(a) The sender and the recipient generate $B > 1$ common inputs.
(b) The sender and the recipient calculate the output of their PM for the set of $B$ inputs and store the $B$ sets of corresponding hidden-unit values $y_i$, $i = 1, \dots, B$ (the internal representations).
(c) Mutual information is transmitted: the sender/recipient sends a package of $B$ bits, $b_i^{S/R}$, to the recipient/sender.
(d) The sender and the recipient update their weights using the same learning rule as for $B = 1$: wherever $b_i^S \neq b_i^R$, the learning step takes place as before, using the corresponding internal representations.
The synchronization time is dramatically reduced, as shown in Fig. 5. For instance, for $N = 21$, $K = 3$, $L = 3$, synchronization is achieved after 12 bit packages if the package size is $B > 32$.

Figure 5: Total number of transmitted bits until synchronization. $B$ is the number of bits in each bit package exchanged between sender and recipient.
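Steps (a) to (d) can be sketched as a single round of a hypothetical `bit_package_round` helper. The sketch rests on two assumptions beyond the text. First, the internal representations stored in step (b) are used unchanged in step (d), even after earlier updates within the same package. Second, "learning as before" means applying Eq. (2) with each partner's own stored output bit. The usage lines only check that a round preserves an already antiparallel state. They do not reproduce the synchronization times of Fig. 5.

```python
import random
from math import prod


def hidden_units(w, x, zero_value):
    """Eq. (1) with the zero-field convention (sender +1, recipient -1)."""
    return [
        1 if f > 0 else -1 if f < 0 else zero_value
        for f in (sum(a * b for a, b in zip(w_k, x_k)) for w_k, x_k in zip(w, x))
    ]


def update(w, x, ys, o, bound):
    """Eq. (2): w_kj <- clip(w_kj - O x_kj) for hidden units with y_k = O."""
    for k, y_k in enumerate(ys):
        if o * y_k > 0:
            w[k] = [max(-bound, min(bound, a - o * b)) for a, b in zip(w[k], x[k])]


def bit_package_round(w_s, w_r, rng, package_size, bound=3):
    """One bit-package round, steps (a)-(d); returns the number of learning steps."""
    k, n = len(w_s), len(w_s[0])
    # (a) B common public inputs
    inputs = [[[rng.choice((-1, 1)) for _ in range(n)] for _ in range(k)]
              for _ in range(package_size)]
    # (b) internal representations for all B inputs, computed before any update
    reps_s = [hidden_units(w_s, x, +1) for x in inputs]
    reps_r = [hidden_units(w_r, x, -1) for x in inputs]
    # (c) each partner sends its package of B output bits
    bits_s = [prod(y) for y in reps_s]
    bits_r = [prod(y) for y in reps_r]
    # (d) learn with the stored representations wherever the bits differ
    learned = 0
    for x, y_s, y_r, b_s, b_r in zip(inputs, reps_s, reps_r, bits_s, bits_r):
        if b_s != b_r:
            update(w_s, x, y_s, b_s, bound)
            update(w_r, x, y_r, b_r, bound)
            learned += 1
    return learned


rng = random.Random(3)
K, N, L = 3, 21, 3
w_s = [[rng.randint(-L, L) for _ in range(N)] for _ in range(K)]
w_r = [[-v for v in row] for row in w_s]  # an already synchronized pair
learned_per_round = [bit_package_round(w_s, w_r, rng, package_size=32) for _ in range(5)]
still_antiparallel = all(a == -b for rs, rr in zip(w_s, w_r) for a, b in zip(rs, rr))
print(learned_per_round, still_antiparallel)  # [32, 32, 32, 32, 32] True
```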

Other extensions of our method, the analytical calculation of the distributions of $t_{sync}$ and $t_{learn}$ for various $K$, $L$ and for continuous weights, and the version space of the PMs consistent with the training set will be discussed in […].
Finally, we remark that synchronization is a subject of recent research in neuroscience. In experiments on cats and monkeys, for instance, it has been found that the spike activity of neurons in the visual cortex has correlations that depend on the kind of optical stimulus shown to the animal [17]. The phenomenon described here suggests that synchronization could be used by biological neuronal networks or by networks of the immune system to exchange secure information between different parts of an organism.